In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random

Poisson Processes

Let's start with a simple homogenous Poisson process. A Poisson process is a continuous-time stochastic process that models the occurence of events within a given time interval. A Poisson process is known as a pure-birth process. The term homogenous refers to the fact that the rate of event occurences is constant over the time domain. The events are assumed to occur with an arrival time modeled by an exponential distribution.

The Poisson process is described by:

$$ P[(N(t + \Delta t) - N(t)) = k] = \frac{(\lambda \Delta t)^k}{k!}e^{-\lambda \Delta t}$$

The above equations describes the probability of $k$ events occuring in the interval $(t, t + \Delta t]$ given $\lambda$, the rate of event occurences given in units of events/$\Delta t$.

I'll mention two approaches for generating events from the Poisson process.

The first approach is similar to the Roulette Wheel sampling method discussed early on. We can build the cumulative density function (CDF) by computing the probabilities for $k = 0, 1, 2, \dots$ events occuring in the time interval $(t, t + \Delta t]$. We then generate a random number from a uniform distribution and find the number of events by finding the matching range of probabilities. Since $k$ could be infinite, we only generate probabilities until we have a match.

There are two potential problems with this approach:

  1. We don't know the exact arrival time of each event -- we only know how many events have occured in the interval. To achieve a fine granularity, we need to use a small value for $\Delta t$.
  2. Finding the number of events requires evaluating the probabilities for all smaller values of $k$.

As a result, this approach ends up being computationally intensive.

For the second approach, we can compute the waiting time $X_n$ of event $n$ by sampling from the exponential distribution:

$$ f(X_n) = (\lambda \Delta t)^k e^{-\lambda \Delta t} $$

The arrival time of $n$ is the sum of all the previous waiting times:

$$ t_n = \sum_{i=1}^n X_i $$

The second approach overcomes both disadvantages of the first approach. It gives exact arrival times and is computationally efficient.

(I am unsure of whether the second approach will hold for non-homogenous Poisson processes.)


In [2]:
class HomogenousPoissonProcess(object):
    def __init__(self, rate=None):
        self._rate = rate
        self._last_arrival = 0.0
        self._last_time_period = 0.0
        
    def sample(self, time_period):
        arrival_times = []
        if self._last_arrival > self._last_time_period:
            arrival_times.append(self._last_arrival)
        self._last_time_period += time_period
        while True:
            waiting_time = random.expovariate(self._rate)
            self._last_arrival += waiting_time
            if self._last_arrival < self._last_time_period:
                arrival_times.append(self._last_arrival)
            else:
                break  
        return arrival_times

In [3]:
rate = 10.0 # 10 events per hour
process = HomogenousPoissonProcess(rate=rate)
arrival_times = process.sample(24.0)
print "%s events" % len(arrival_times)
print "Expected: %s events/hour" % rate
print "Observed: %s events/hour" % (len(arrival_times) / 24.0)
plt.plot(xrange(1, len(arrival_times) + 1), arrival_times)
plt.xlabel("Events")
plt.ylabel("Arrival Times")
plt.title("Homogenous Poisson Process")


229 events
Expected: 10.0 events/hour
Observed: 9.54166666667 events/hour
Out[3]:
<matplotlib.text.Text at 0x1045769d0>

In [3]: